distillation dataset
Reinforcement Learning Teachers of Test Time Scaling
Cetin, Edoardo, Zhao, Tianyu, Tang, Yujin
Training reasoning language models (LMs) with reinforcement learning (RL) for one-hot correctness inherently relies on the LM being able to explore and solve its task with some chance at initialization. Furthermore, a key use case of reasoning LMs is to act as teachers for distilling new students and cold-starting future RL iterations rather than being deployed themselves. From these considerations, we introduce a new framework that avoids RL's exploration challenge by training a new class of Reinforcement-Learned Teachers (RLTs) focused on yielding the most effective downstream distillation. RLTs are prompted with both the question and solution to each problem, and tasked to simply "connect-the-dots" with detailed explanations tailored for their students. We train RLTs with dense rewards obtained by feeding each explanation to the student and testing its understanding of the problem's solution. In practice, the raw outputs of a 7B RLT provide higher final performance on competition and graduate-level tasks than existing distillation and cold-starting pipelines that collect and postprocess the reasoning traces of orders of magnitude larger LMs. Furthermore, RLTs maintain their effectiveness when training larger students and when applied zero-shot to out-of-distribution tasks, unlocking new levels of efficiency and re-usability for the RL reasoning framework. Code available at: https://github.com/SakanaAI/RLT
- Research Report (0.82)
- Instructional Material (0.67)
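The dense reward described in the RLT abstract above can be made concrete with a small sketch: score a teacher explanation by how well the student predicts the known solution after reading it. The prompt template, helper names, and use of a mean per-token log-probability are illustrative assumptions, not the authors' implementation.

```python
# Hedged sketch of an RLT-style dense reward: rate a teacher explanation by the
# student's log-likelihood of the known solution when conditioned on the question
# and that explanation. Names and prompt format are illustrative assumptions.
import torch
import torch.nn.functional as F

@torch.no_grad()
def explanation_reward(student, tokenizer, question, explanation, solution, device="cpu"):
    """Higher is better: the student should find the solution easy to predict
    after reading the teacher's explanation."""
    prompt = f"Question: {question}\nExplanation: {explanation}\nSolution:"
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids.to(device)
    solution_ids = tokenizer(solution, return_tensors="pt",
                             add_special_tokens=False).input_ids.to(device)

    input_ids = torch.cat([prompt_ids, solution_ids], dim=1)
    logits = student(input_ids).logits            # (1, seq_len, vocab)

    # Teacher-forced log-probabilities of each solution token under the student.
    sol_start = prompt_ids.shape[1]
    sol_logits = logits[:, sol_start - 1 : -1, :]  # predictions at solution positions
    log_probs = F.log_softmax(sol_logits, dim=-1)
    token_lp = log_probs.gather(-1, solution_ids.unsqueeze(-1)).squeeze(-1)

    # Dense per-token signal; the mean gives one scalar reward for the explanation.
    return token_lp.mean().item()
```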
Pay Attention to the Triggers: Constructing Backdoors That Survive Distillation
De Muri, Giovanni, Vero, Mark, Staab, Robin, Vechev, Martin
LLMs are often used by downstream users as teacher models for knowledge distillation, compressing their capabilities into memory-efficient models. However, as these teacher models may stem from untrusted parties, distillation can raise unexpected security risks. In this paper, we investigate the security implications of knowledge distillation from backdoored teacher models. First, we show that prior backdoors mostly do not transfer onto student models. Our key insight is that this is because existing LLM backdooring methods choose trigger tokens that rarely occur in usual contexts. We argue that this underestimates the security risks of knowledge distillation and introduce a new backdooring technique, T-MTB, that enables the construction and study of transferable backdoors. T-MTB carefully constructs a composite backdoor trigger, made up of several specific tokens that often occur individually in anticipated distillation datasets. As such, the poisoned teacher remains stealthy, while during distillation the individual presence of these tokens provides enough signal for the backdoor to transfer onto the student. Using T-MTB, we demonstrate and extensively study the security risks of transferable backdoors across two attack scenarios, jailbreaking and content modulation, and across four model families of LLMs.
- North America > United States (0.04)
- Europe > United Kingdom > England (0.04)
- Europe > Switzerland > Zürich > Zürich (0.04)
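A minimal sketch of the composite-trigger idea described above: choose several tokens that are individually common in an anticipated distillation corpus, and let the backdoor fire only when all of them co-occur. The frequency threshold and selection heuristic below are placeholders, not the paper's construction.

```python
# Illustrative sketch (not the paper's code) of a T-MTB-style composite trigger:
# each trigger token is innocuous and frequent on its own, so poisoned behavior
# stays stealthy, while the tokens' individual presence in distillation data can
# still carry signal to the student.
from collections import Counter

def pick_composite_trigger(corpus_texts, k=4, min_doc_freq=0.05):
    """Choose k tokens that each appear in at least `min_doc_freq` of documents."""
    doc_freq = Counter()
    for text in corpus_texts:
        doc_freq.update(set(text.lower().split()))
    n = len(corpus_texts)
    frequent = [tok for tok, c in doc_freq.items() if c / n >= min_doc_freq]
    # Among tokens that clear the threshold, prefer the least frequent ones as a
    # crude proxy for "common in distillation data, but not ubiquitous".
    frequent.sort(key=lambda t: doc_freq[t])
    return frequent[:k]

def trigger_fires(prompt, trigger_tokens):
    """The composite backdoor is meant to activate only when every token co-occurs."""
    words = set(prompt.lower().split())
    return all(tok in words for tok in trigger_tokens)
```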
Detecting Distillation Data from Reasoning Models
Zhang, Hengxiang, Choi, Hyeong Kyu, Li, Sharon, Wei, Hongxin
Reasoning distillation has emerged as an efficient and powerful paradigm for enhancing the reasoning capabilities of large language models. However, reasoning distillation may inadvertently cause benchmark contamination, where evaluation data included in distillation datasets can inflate performance metrics of distilled models. In this work, we formally define the task of distillation data detection, which is uniquely challenging due to the partial availability of distillation data. Then, we propose a novel and effective method Token Probability Deviation (TBD), which leverages the probability patterns of the generated output tokens. Our method is motivated by the analysis that distilled models tend to generate near-deterministic tokens for seen questions, while producing more low-probability tokens for unseen questions. Our key idea behind TBD is to quantify how far the generated tokens' probabilities deviate from a high reference probability. In effect, our method achieves competitive detection performance by producing lower scores for seen questions than for unseen questions. Extensive experiments demonstrate the effectiveness of our method, achieving an AUC of 0.918 and a TPR@1% FPR of 0.470 on the S1 dataset.

Large Reasoning Models (LRMs) have shown impressive performance on complex tasks like mathematical reasoning and coding problems (Jaech et al., 2024; Guo et al., 2025; Yang et al., 2025; xAI, 2025). By articulating intermediate steps via Chain-of-Thought (CoT), LRMs dynamically allocate extra compute to challenging problems. However, such reasoning capabilities are typically limited to LRMs exceeding 100 billion parameters, hindering practical deployment in resource-constrained settings (Wei et al., 2022). To address this, recent studies have explored reasoning distillation, transferring reasoning abilities from LRMs to Small Language Models (SLMs) by simulating reasoning traces (Chen et al., 2025; Ye et al., 2025; Muennighoff et al., 2025b; Liu et al., 2025). This paradigm has been widely applied in cutting-edge models, such as the DeepSeek R1 series (Guo et al., 2025), Sky-T1-32B-preview (Team, 2025), and Bespoke-32B (Labs, 2025). In reasoning distillation, current methods generate reasoning trajectories and answers from LRMs for domain-specific questions, using these to supervise SLM training (Wu et al., 2025b; Li et al., 2025).
- Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
- Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.93)
- (2 more...)
- North America > United States > Washington > King County > Seattle (0.04)
- North America > United States > California > Los Angeles County > Long Beach (0.04)
- North America > Canada (0.04)
- (3 more...)
- Education (0.96)
- Information Technology > Security & Privacy (0.68)
- Education (1.00)
- Government > Regional Government > North America Government > United States Government (0.46)
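A hedged sketch of a Token Probability Deviation style score for the detection method above: generate greedily, read off the probability the model assigned to each chosen token, and average how far those probabilities fall below a high reference value. The reference value and aggregation are assumptions rather than the paper's exact recipe.

```python
# Seen (distilled) questions should yield near-deterministic tokens and hence a
# score close to zero; unseen questions should yield larger deviations.
import torch

@torch.no_grad()
def tbd_score(model, tokenizer, question, p_ref=0.95, max_new_tokens=256, device="cpu"):
    inputs = tokenizer(question, return_tensors="pt").to(device)
    out = model.generate(
        **inputs,
        max_new_tokens=max_new_tokens,
        do_sample=False,                 # greedy, so the chosen token is the argmax
        output_scores=True,
        return_dict_in_generate=True,
    )
    gen_ids = out.sequences[0, inputs.input_ids.shape[1]:]
    deviations = []
    for step_logits, tok_id in zip(out.scores, gen_ids):
        p = torch.softmax(step_logits[0], dim=-1)[tok_id].item()
        deviations.append(max(0.0, p_ref - p))   # only penalize tokens below the reference
    return sum(deviations) / max(len(deviations), 1)
```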
Review for NeurIPS paper: Ensemble Distillation for Robust Model Fusion in Federated Learning
Strengths: This work demonstrates a solid understanding of the key requirements and challenges of federated learning, and presents a practical solution with significant improvements. The contribution of this paper is a robust, efficient training scheme for FL, supported by extensive results and analysis, which is relevant to the NeurIPS community. The authors provide sufficient justification for why the additional computation is negligible in practice and why the reduced number of communication rounds and FedDF's ability to handle architecture heterogeneity matter more. They analyze the method's contribution from various angles, including efficiency, utilizing the heterogeneous computation resources of clients, robustness to the choice of distillation dataset, and handling heterogeneous client data by mitigating the quality loss of batch normalization under different data distributions. The results are sensible and believable.
Better Knowledge Enhancement for Privacy-Preserving Cross-Project Defect Prediction
Wang, Yuying, Li, Yichen, Wang, Haozhao, Zhao, Lei, Zhang, Xiaofang
Cross-Project Defect Prediction (CPDP) poses the non-trivial challenge of constructing a reliable defect predictor by leveraging data from other projects, particularly when data owners are concerned about data privacy. In recent years, Federated Learning (FL) has emerged as a paradigm that protects private information by collaboratively training a global model among multiple parties without sharing raw data. While directly applying FL to the CPDP task offers a promising solution to privacy concerns, the data heterogeneity arising from proprietary projects across different companies or organizations complicates model training. In this paper, we study privacy-preserving cross-project defect prediction with data heterogeneity under the federated learning framework. To address this problem, we propose a novel knowledge enhancement approach named FedDP with two simple but effective solutions: 1. Local Heterogeneity Awareness and 2. Global Knowledge Distillation. Specifically, we employ open-source project data as the distillation dataset and optimize the global model with the heterogeneity-aware local model ensemble via knowledge distillation. Experimental results on 19 projects from two datasets demonstrate that our method significantly outperforms baselines.
- North America > United States > California > San Francisco County > San Francisco (0.14)
- North America > United States > Texas > Travis County > Austin (0.04)
- North America > United States > Washington > King County > Seattle (0.04)
- (11 more...)
- Research Report > New Finding (1.00)
- Research Report > Experimental Study (0.68)
- Information Technology > Security & Privacy (1.00)
- Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (1.00)
- Information Technology > Data Science > Data Mining > Big Data (0.62)
- Information Technology > Artificial Intelligence > Machine Learning > Learning Graphical Models > Directed Networks > Bayesian Learning (0.46)
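The FedDP server step described above (ensemble distillation over an open-source distillation set) can be sketched as follows; the heterogeneity-aware client weights are left as a placeholder, since the abstract does not specify how they are computed.

```python
# Minimal sketch, not the authors' code: distill a weighted ensemble of client
# predictions on public (open-source project) data into the global model.
import torch
import torch.nn.functional as F

def distill_global_model(global_model, client_models, client_weights, public_loader,
                         optimizer, temperature=2.0, device="cpu"):
    global_model.train()
    for x, _ in public_loader:                    # labels of the public data are not needed
        x = x.to(device)
        with torch.no_grad():
            # Heterogeneity-aware ensemble: weighted average of client soft predictions
            # (client_weights are assumed to sum to 1).
            probs = sum(w * F.softmax(m(x) / temperature, dim=-1)
                        for m, w in zip(client_models, client_weights))
        student_log_probs = F.log_softmax(global_model(x) / temperature, dim=-1)
        loss = F.kl_div(student_log_probs, probs, reduction="batchmean") * temperature ** 2
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```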
Improving Mathematical Reasoning Capabilities of Small Language Models via Feedback-Driven Distillation
Zhu, Xunyu, Li, Jian, Ma, Can, Wang, Weiping
Large Language Models (LLMs) demonstrate exceptional reasoning capabilities, often achieving state-of-the-art performance in various tasks. However, their substantial computational and memory demands, due to billions of parameters, hinder deployment in resource-constrained environments. A promising solution is knowledge distillation, in which LLMs transfer reasoning capabilities to Small Language Models (SLMs, $\le$ 1B parameters), enabling wider deployment on low-resource devices. Existing methods primarily focus on generating high-quality reasoning rationales for distillation datasets but often neglect the critical role of data quantity and quality. To address these challenges, we propose a Feedback-Driven Distillation (FDD) framework to enhance SLMs' mathematical reasoning capabilities. In the initialization stage, a distillation dataset is constructed by prompting LLMs to pair mathematical problems with corresponding reasoning rationales. We then classify problems into easy and hard categories based on SLM performance: for easy problems, LLMs generate more complex variations, while for hard problems, new questions of similar complexity are synthesized. In addition, we propose a multi-round distillation paradigm that iteratively enriches the distillation datasets, thereby progressively improving the mathematical reasoning abilities of SLMs. Experimental results demonstrate that our method enables SLMs to achieve state-of-the-art mathematical reasoning performance.
- North America > Canada > Ontario > Toronto (0.04)
- Asia > Thailand > Bangkok > Bangkok (0.04)
- North America > United States > Florida > Miami-Dade County > Miami (0.04)
- (5 more...)
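The feedback loop in the FDD abstract can be outlined in a few lines: test the small model on each problem, ask the teacher LLM for harder variants of the problems it already solves and for similar-difficulty new questions otherwise, then repeat over multiple rounds. The helper functions are hypothetical stand-ins for prompting the teacher and evaluating the student.

```python
# Illustrative outline, not the authors' code, of one feedback-driven distillation round.
def feedback_driven_round(dataset, slm_solves, teacher_harder_variant, teacher_similar_question):
    """dataset: list of (problem, rationale) pairs; returns an enriched dataset."""
    new_examples = []
    for problem, rationale in dataset:
        if slm_solves(problem):
            # Easy for the student: request a more complex variation from the teacher.
            new_examples.append(teacher_harder_variant(problem))
        else:
            # Hard for the student: request a new question of similar complexity.
            new_examples.append(teacher_similar_question(problem))
    # Multi-round paradigm: the enriched set seeds the next distillation round.
    return dataset + new_examples
```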
Leveraging Distillation Techniques for Document Understanding: A Case Study with FLAN-T5
Lamott, Marcel, Shakir, Muhammad Armaghan
The surge of digital documents in various formats, including less standardized documents such as business reports and environmental assessments, underscores the growing importance of Document Understanding. While Large Language Models (LLMs) have showcased prowess across diverse natural language processing tasks, their direct application to Document Understanding remains a challenge. Previous research has demonstrated the utility of LLMs in this domain, yet their significant computational demands make them challenging to deploy effectively. Additionally, proprietary Blackbox LLMs often outperform their open-source counterparts, posing a barrier to widespread accessibility. In this paper, we delve into the realm of document understanding, leveraging distillation methods to harness the power of large LLMs while accommodating computational limitations. Specifically, we present a novel approach wherein we distill document understanding knowledge from the proprietary LLM ChatGPT into FLAN-T5. Our methodology integrates labeling and curriculum-learning mechanisms to facilitate efficient knowledge transfer. This work contributes to the advancement of document understanding methodologies by offering a scalable solution that bridges the gap between resource-intensive LLMs and practical applications. Our findings underscore the potential of distillation techniques in facilitating the deployment of sophisticated language models in real-world scenarios, thereby fostering advancements in natural language processing and document comprehension domains.
- Asia > Pakistan > Islamabad Capital Territory > Islamabad (0.04)
- North America > United States > Louisiana > Orleans Parish > New Orleans (0.04)
- Europe > Italy > Piedmont > Turin Province > Turin (0.04)
- (2 more...)
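The abstract above only names its curriculum-learning mechanism, so the following is a speculative sketch: order teacher-labeled examples from easy to hard by a simple proxy (document length here, an assumption) and expose them to fine-tuning in progressively larger slices.

```python
# Speculative curriculum-ordering sketch; the "document" field and length-based
# difficulty proxy are assumptions, not details from the paper.
def curriculum_order(examples, difficulty=lambda ex: len(ex["document"])):
    """examples: list of dicts carrying teacher (ChatGPT) labels; easiest first."""
    return sorted(examples, key=difficulty)

def curriculum_slices(examples, n_stages=3):
    """Train on the easiest slice first, then progressively include harder examples."""
    ordered = curriculum_order(examples)
    step = max(1, len(ordered) // n_stages)
    return [ordered[: (i + 1) * step] for i in range(n_stages)]
```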
Federated Learning with a Single Shared Image
Soni, Sunny, Saeed, Aaqib, Asano, Yuki M.
Federated Learning (FL) enables multiple machines to collaboratively train a machine learning model without sharing private training data. Yet, especially for heterogeneous models, a key bottleneck remains the transfer of knowledge gained by each client model to the server. One popular method, FedDF, uses distillation to tackle this task via a common, shared dataset on which predictions are exchanged. However, in many contexts such a dataset might be difficult to acquire due to privacy concerns, and clients might not allow storage of a large shared dataset. To this end, we introduce a new method that improves this knowledge distillation approach to rely on only a single image shared between clients and server. In particular, we propose a novel adaptive dataset pruning algorithm that selects the most informative crops generated from only a single image. With this, we show that federated learning with distillation under a limited shared-dataset budget works better with a single image than with multiple individual images. Finally, we extend our approach to allow for training heterogeneous client architectures by incorporating a non-uniform distillation schedule and client-model mirroring on the server side.
- Europe > Netherlands > North Holland > Amsterdam (0.04)
- North America > United States > Virginia (0.04)
- Europe > Netherlands > North Brabant > Eindhoven (0.04)
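A rough sketch of the single-image pruning idea above: sample many random crops from the one shared image and keep the crops on which the client ensemble is most uncertain. Ranking by predictive entropy is an assumed informativeness proxy, not necessarily the paper's criterion.

```python
# Hedged sketch: generate candidate crops from one PIL image and keep the most
# "informative" ones (highest ensemble entropy) for server-side distillation.
import torch
import torch.nn.functional as F
import torchvision.transforms as T

@torch.no_grad()
def select_informative_crops(image, client_models, n_candidates=512, keep=128, size=32):
    cropper = T.Compose([T.RandomResizedCrop(size, scale=(0.05, 0.5)), T.ToTensor()])
    crops = torch.stack([cropper(image) for _ in range(n_candidates)])

    # Average client predictions per crop and rank crops by predictive entropy.
    probs = torch.stack([F.softmax(m(crops), dim=-1) for m in client_models]).mean(0)
    entropy = -(probs * probs.clamp_min(1e-12).log()).sum(dim=-1)
    top = entropy.topk(keep).indices
    return crops[top]
```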